Linear Regression

Class Reference

class pykitml.LinearRegression(input_size, output_size, reg_param=0)

Implements linear regression.

__init__(input_size, output_size, reg_param=0)
Parameters:
  • input_size (int) – Size of input data or number of input features.
  • output_size (int) – Size of the output, i.e., the number of values to predict.
  • reg_param (int) – Regularization parameter for the model, also known as ‘weight decay’.
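
To illustrate what a weight-decay parameter such as reg_param typically does, here is a minimal NumPy sketch of an L2-regularized squared-error cost and its gradient. The function names, the 1/n scaling, and the choice to leave the bias unpenalized are assumptions made for illustration; pykitml's internal formula may differ.

```python
import numpy as np

def regularized_cost(weights, bias, inputs, targets, reg_param=0.0):
    """Half mean squared error plus an L2 penalty on the weights (assumed form)."""
    n = inputs.shape[0]
    predictions = inputs @ weights + bias
    data_term = 0.5 * np.sum((predictions - targets) ** 2) / n
    # The penalty grows with the squared size of the weights; the bias is excluded.
    penalty = 0.5 * reg_param * np.sum(weights ** 2) / n
    return data_term + penalty

def regularized_gradient(weights, bias, inputs, targets, reg_param=0.0):
    """Gradient of regularized_cost with respect to the weights and the bias."""
    n = inputs.shape[0]
    error = inputs @ weights + bias - targets
    grad_w = (inputs.T @ error + reg_param * weights) / n
    grad_b = np.sum(error, axis=0) / n
    return grad_w, grad_b
```

With reg_param=0 the penalty vanishes and the cost reduces to plain squared error. Larger values pull the weights toward zero at every gradient step, which is why the term is also called weight decay.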
feed(input_data)

Accepts input array and feeds it to the model.

Parameters: input_data (numpy.array) – The input to feed the model.
Raises: ValueError – If the input data has invalid dimensions/shape.

Note

This function only feeds the input data. To get the output after calling it, use get_output() or get_output_onehot().

get_output()

Returns the output activations of the model.

Returns: The output activations.
Return type: numpy.array
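
The feed-then-read pattern described above can be sketched with a small stand-in class. TinyLinearModel is hypothetical and is not pykitml's implementation. It only shows one plausible way feed() could validate the input shape, store the result, and let get_output() return it.

```python
import numpy as np

class TinyLinearModel:
    """Hypothetical stand-in for illustration; not pykitml's implementation."""

    def __init__(self, input_size, output_size):
        self.weights = np.zeros((input_size, output_size))
        self.bias = np.zeros(output_size)
        self._output = None

    def feed(self, input_data):
        """Compute and store the output for one example or a batch of examples."""
        input_data = np.asarray(input_data, dtype=float)
        # Accept a single example (1-D) or a batch (2-D) with matching feature count.
        if input_data.ndim not in (1, 2) or input_data.shape[-1] != self.weights.shape[0]:
            raise ValueError(
                f"expected last dimension {self.weights.shape[0]}, "
                f"got shape {input_data.shape}"
            )
        self._output = input_data @ self.weights + self.bias

    def get_output(self):
        """Return the output stored by the most recent feed() call."""
        return self._output
```

In this sketch feed() returns nothing. The result is only available through get_output(), matching the note above.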
train(training_data, targets, batch_size, epochs, optimizer, testing_data=None, testing_targets=None, testing_freq=1, decay_freq=1)

Trains the model on the training data. After training is complete, you can call plot_performance() to plot performance graphs.

Parameters:
  • training_data (numpy.array) – numpy array containing training data.
  • targets (numpy.array) – numpy array containing training targets, corresponding to the training data.
  • batch_size (int) – Number of training examples to use in one epoch, or number of training examples to use to estimate the gradient.
  • epochs (int) – Number of epochs the model should be trained for.
  • optimizer (any Optimizer object) – See Optimizers
  • testing_data (numpy.array) – numpy array containing testing data.
  • testing_targets (numpy.array) – numpy array containing testing targets, corresponding to the testing data.
  • testing_freq (int) – How frequently the model should be tested, i.e., the model will be tested after every testing_freq epochs. You may want to increase this to reduce training time.
  • decay_freq (int) – How frequently the model should decay the learning rate. The learning rate will decay after every decay_freq epochs.
Raises:

ValueError – If training_data, targets, testing_data or testing_targets has invalid dimensions/shape.
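
As a rough illustration of how batch_size, epochs, and decay_freq interact, the sketch below runs one gradient step per epoch on batch_size examples and shrinks the learning rate every decay_freq epochs. It uses plain gradient descent rather than an Optimizer object, and both the function train_sketch and its wrap-around batching scheme are assumptions, not pykitml's actual training loop.

```python
import numpy as np

def train_sketch(inputs, targets, batch_size, epochs,
                 learning_rate=0.1, decay_rate=0.99, decay_freq=1):
    """Fit weights and bias with one gradient step per epoch (illustrative only)."""
    n, d = inputs.shape
    weights = np.zeros((d, targets.shape[1]))
    bias = np.zeros(targets.shape[1])
    for epoch in range(epochs):
        # Take the next batch_size examples, wrapping around the dataset.
        idx = np.arange(epoch * batch_size, (epoch + 1) * batch_size) % n
        x, y = inputs[idx], targets[idx]
        # Gradient of the half mean squared error, estimated on this batch.
        error = x @ weights + bias - y
        weights -= learning_rate * (x.T @ error) / batch_size
        bias -= learning_rate * error.mean(axis=0)
        # Decay the learning rate after every decay_freq epochs.
        if (epoch + 1) % decay_freq == 0:
            learning_rate *= decay_rate
    return weights, bias
```

A larger decay_freq keeps the learning rate high for longer. A decay_rate of 1.0 disables decay entirely.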

plot_performance()

Plots logged performance data after training. Should be called after train().

Raises:
  • AttributeError – If the model has not been trained, i.e., train() has not been called before.
  • IndexError – If train() failed.
r2score(testing_data, testing_targets)

Returns the R-squared (coefficient of determination) value of the model on the testing data.

Parameters:
  • testing_data (numpy.array) – numpy array containing testing data.
  • testing_targets (numpy.array) – numpy array containing testing targets, corresponding to the testing data.
Returns:

r2score – The R-squared value of the model over the testing data.

Return type:

float

Raises:

ValueError – If testing_data or testing_targets has invalid dimensions/shape.
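
For reference, the coefficient of determination can be computed from predictions and targets as shown below. This is a minimal sketch of the standard definition, 1 - SS_res / SS_tot. The helper name r2_score_sketch is hypothetical and takes predictions directly, whereas r2score() takes the testing data and runs the model itself.

```python
import numpy as np

def r2_score_sketch(predictions, targets):
    """Coefficient of determination: 1 - residual sum of squares / total sum of squares."""
    predictions = np.asarray(predictions, dtype=float)
    targets = np.asarray(targets, dtype=float)
    residual = np.sum((targets - predictions) ** 2)
    total = np.sum((targets - targets.mean(axis=0)) ** 2)
    return 1.0 - residual / total
```

A value of 1 means the predictions match the targets exactly, and 0 means the model does no better than always predicting the mean of the targets. Negative values are possible for a model that does worse than that.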

cost(testing_data, testing_targets)

Computes the average cost of the model on the testing data passed to the function.

Parameters:
  • testing_data (numpy.array) – numpy array containing testing data.
  • testing_targets (numpy.array) – numpy array containing testing targets, corresponding to the testing data.
Returns:

cost – The average cost of the model over the testing data.

Return type:

float

Raises:

ValueError – If testing_data or testing_targets has invalid dimensions/shape.

Example: Predicting Fish Length

Dataset

Fish Length - pykitml.datasets.fishlength module

Training Model

import pykitml as pk
from pykitml.datasets import fishlength

# Load the dataset
inputs, outputs = fishlength.load()

# Normalize inputs
array_min, array_max = pk.get_minmax(inputs)
inputs = pk.normalize_minmax(inputs, array_min, array_max)

# Create polynomial features
inputs_poly = pk.polynomial(inputs)

# Normalize outputs
array_min, array_max = pk.get_minmax(outputs)
outputs = pk.normalize_minmax(outputs, array_min, array_max)

# Create model
fish_classifier = pk.LinearRegression(inputs_poly.shape[1], 1)

# Train the model
fish_classifier.train(
    training_data=inputs_poly,
    targets=outputs,
    batch_size=22,
    epochs=200,
    optimizer=pk.Adam(learning_rate=0.02, decay_rate=0.99),
    testing_freq=1,
    decay_freq=10
)

# Save model
pk.save(fish_classifier, 'fish_classifier.pkl')

# Plot performance
fish_classifier.plot_performance()

# Print r2 score
print('r2score:', fish_classifier.r2score(inputs_poly, outputs))
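
The polynomial feature step in the script above lets a linear model fit curved relationships by adding powers and products of the inputs as extra columns. The sketch below shows the general idea for a given degree. polynomial_features is a hypothetical helper, and the column order and default degree are assumptions, so its output should not be expected to match pk.polynomial exactly.

```python
from itertools import combinations_with_replacement

import numpy as np

def polynomial_features(data, degree=2):
    """Return all products of the input columns up to the given degree."""
    data = np.atleast_2d(np.asarray(data, dtype=float))
    n_features = data.shape[1]
    columns = []
    for d in range(1, degree + 1):
        # Each combination picks d columns (with repetition) to multiply together.
        for combo in combinations_with_replacement(range(n_features), d):
            columns.append(np.prod(data[:, list(combo)], axis=1))
    return np.column_stack(columns)
```

For two inputs x0 and x1 and degree 2, this produces the columns x0, x1, x0², x0·x1, and x1².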

Predicting the length of a fish that is 28 days old at 25 °C

import numpy as np
import pykitml as pk
from pykitml.datasets import fishlength

# Predict length of fish that is 28 days old at 25C

# Load the dataset
inputs, outputs = fishlength.load()

# Load the model
fish_classifier = pk.load('fish_classifier.pkl')

# Normalize inputs
array_min, array_max = pk.get_minmax(inputs)
input_data = pk.normalize_minmax(np.array([28, 25]), array_min, array_max)

# Create polynomial features
input_data_poly = pk.polynomial(input_data)

# Get output
fish_classifier.feed(input_data_poly)
model_output = fish_classifier.get_output()

# Denormalize output
array_min, array_max = pk.get_minmax(outputs)
model_output = pk.denormalize_minmax(model_output, array_min, array_max)

# Print result
print(model_output)
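
The script above reuses the training set's minimum and maximum to scale the new sample and to map the model output back to the original units. The round trip can be sketched as follows, assuming min-max scaling to the range [0, 1]. The helper names and the sample values are made up for illustration, and pykitml's normalize_minmax may use a different target range.

```python
import numpy as np

def minmax_normalize(array, array_min, array_max):
    """Scale values so that array_min maps to 0 and array_max maps to 1."""
    return (array - array_min) / (array_max - array_min)

def minmax_denormalize(array, array_min, array_max):
    """Invert minmax_normalize, mapping scaled values back to original units."""
    return array * (array_max - array_min) + array_min

# Made-up training inputs: one row per example, one column per feature.
train_inputs = np.array([[14.0, 25.0], [28.0, 27.0], [153.0, 31.0]])
col_min, col_max = train_inputs.min(axis=0), train_inputs.max(axis=0)

# A new sample must be scaled with the training statistics, not its own.
sample = np.array([28.0, 25.0])
scaled = minmax_normalize(sample, col_min, col_max)
restored = minmax_denormalize(scaled, col_min, col_max)
```

Scaling a new sample with its own minimum and maximum would place it on a different scale from the data the model was trained on, which is why the statistics are recomputed from the original dataset before prediction.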

Performance Graph

_images/linear_regression_perf_graph.png